Accumulating evidence indicates that MERS-CoV originated from bat coronaviruses (BatCoVs). Previously, we
demonstrated that both MERS-CoV and BatCoV HKU4 use CD26 as a receptor, but how the BatCoVs evolved to
bind CD26 remains an open question. Here, we solved the crystal structure of the S1 subunit C-terminal domain
of HKU5 (HKU5-CTD), another BatCoV that is phylogenetically related to MERS-CoV but cannot bind to CD26.
We observed that HKU5-CTD shares the conserved core subdomain of other betacoronaviruses (betaCoVs) and has
an external subdomain with a topology similar to those of MERS-CoV and HKU4, indicating a common ancestor of
lineage C betaCoVs. However, two deletions in two respective loops of HKU5-CTD result in conformational
variations in the CD26-binding interface and are responsible for the failure of HKU5-CTD to bind CD26. Given
the sequence variation in the HKU5-CTD receptor-binding interface, we propose that mutations in the BatCoV
HKU5 spike protein should be monitored in case of bat-to-human interspecies transmission.


1. Introduction 

Coronaviruses (CoVs) are spherical enveloped viruses with single 
positive-strand RNA genomes of ~30 kb in length, which is the largest 
among RNA viruses (Saif, 1993). CoVs are divided into four genera: 
alpha-, beta-, gamma-, and deltaCoVs (de Groot et al., 2013). BetaCoVs 
are further subdivided into four lineages/subgroups: A, B, C, and D 
(Chan et al., 2015). To date, both alpha- and betaCoVs have been found
to infect humans (Lu et al., 2015), causing subclinical or very mild
symptoms and accounting for 10-15% of common colds (Heikkinen
and Jarvinen, 2003). However, CoVs can also be life-threatening and
have pandemic potential. The epidemic of severe acute respiratory
syndrome coronavirus (SARS-CoV), which belongs to lineage B of the
betaCoVs, originated in southern China in 2002 and spread to 28
countries, infecting over 8000 people and leading to almost 800 related deaths
(WHO, 2004). The outbreak of MERS-CoV, a member of the lineage C 
betaCoVs (Cotten et al., 2013; Zaki et al., 2012), has caused 1832 
laboratory-confirmed cases since 2012, including 651 related deaths as 
of Nov. 28, 2016 (WHO, 2016). Unlike the SARS-CoV, which suddenly 
disappeared after a massive global disease control effort, especially in 
China, the number of MERS-CoV infections is still on the rise. 

Mounting evidence indicates that CoVs circulating in bats 
(BatCoVs) are the gene sources of alphaCoVs and betaCoVs (W. Li 
et al., 2005; Woo et al., 2012), including SARS-CoV (Ge et al., 2013; 
Lau et al., 2005; W. Li et al., 2005). The data also underscore that bats 
are the likely natural reservoir for MERS-CoV (Annan et al., 2013; 
Ithete et al., 2013; Memish et al., 2013; Wang et al., 2014; Yang et al., 
2014). For instance, viral gene fragments identical or quite similar to 
those of MERS-CoV have been reported in bats (Annan et al., 2013;
Ithete et al., 2013; Memish et al., 2013). Moreover, parallel studies
from our group and others show that BatCoV HKU4, grouped in
lineage C with MERS-CoV, can also use human CD26 (hCD26; the
receptor of MERS-CoV) for viral entry (Wang et al., 2014; Yang et al.,
2014). In other words, two members of lineage C use the same human
receptor: one (MERS-CoV) has caused an epidemic of human infection,
and the other (BatCoV HKU4) can utilize the same receptor and therefore
has the potential to infect humans. This highlights the necessity of
surveillance for lineage C betaCoVs, including BatCoV HKU5, which was
first sequenced in 2006 in Japanese pipistrelles (Pipistrellus abramus)
(Woo et al., 2006) and is circulating in bats (Lau et al., 2013). Whether
this virus has the potential to cross the bat-human barrier needs to be
evaluated.

CoV infection begins with the virus binding to the host receptor.
The trimeric spike (S) protein, which studs the viral envelope, plays a
pivotal role in this process. S is divided into two parts: S1, responsible
for receptor binding, and S2, which initiates fusion (Belouzard et al.,
2012; Dai et al., 2016; Kielian and Rey, 2006). S1 contains two
relatively independent structures named the N-terminal domain
(NTD) and C-terminal domain (CTD) based on their position. Most
betaCoVs use the CTD as the receptor-binding domain (RBD/CTD),
except mouse hepatitis virus (MHV), which uses the NTD to bind the
cellular receptor carcinoembryonic-antigen-related cell-adhesion
molecule 1 (CEACAM1) (Dai et al., 2016; Du et al., 2009; F. Li et al., 2005;
Lu et al., 2013; Peng et al., 2011). Two of the RBD/CTDs in lineage C
betaCoVs (MERS-RBD/CTD and HKU4-RBD/CTD) bind to the same
human receptor CD26 (hCD26) to initiate infection, and the two
domains share high sequence identity (55%) in addition to high
structural similarity, with a root mean square deviation (rmsd) of
1.114 Å (193 Cα) (Lu et al., 2013; Wang et al., 2014). Despite the similar
sequence identities between HKU5-CTD and MERS-RBD/CTD (54%)
or HKU4-RBD/CTD (57%), no detectable binding was found between
HKU5-CTD and hCD26. The structural basis for this difference remains
to be elucidated.
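
The pairwise identities quoted above (54-57%) are simple column counts over an alignment. The following is a minimal sketch of that calculation, assuming two pre-aligned sequences of equal length and counting only columns in which neither sequence has a gap. The function name and the toy fragments are hypothetical and are not the CTD sequences; alignment programs differ in the denominator they use, so the values need not match the published ones exactly.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x in "-." or y in "-.":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Toy aligned fragments (hypothetical, for illustration only).
toy_a = "ACDE-GHIK"
toy_b = "ACDQFGH-K"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity over ungapped columns")
```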

In this study, we determined the structure of HKU5-CTD. Similar to
other solved structures, HKU5-CTD contains two subdomains: the core
subdomain, homologous to those of other betaCoV CTDs, and the external
subdomain, which resembles those of MERS-RBD/CTD and HKU4-RBD/CTD,
indicating conservation of the external subdomain in lineage C. However,
two deletions in HKU5-CTD lead to structural shifts in the
hCD26-interaction interface and thereby result in its inability to bind this
receptor. Our results suggest that the characteristic insertions between βc4
and βc5 among different lineages of betaCoVs result in different
receptor engagement, thereby contributing to potential interspecies
transmission.

2. Results 

2.1. Overall structure of the HKU5-CTD 

We first characterized the S protein of BatCoV HKU5 through
bioinformatics analysis. BatCoV HKU5 S is composed of 1352 amino
acids and exhibits typical features of CoV S proteins, including the
predicted hydrophobic residue-rich HR1 and HR2 motifs and
concentrations of hydrophobic amino acids similar to those of the SARS-CoV
fusion peptide (FP), internal fusion peptide (IFP), and pre-transmembrane
domain (PTM) (Gao et al., 2013; Mahajan and Bhattacharjya, 2015; Xu et al.,
2004; Zhu et al., 2004). As in the MERS-CoV S protein, a furin-like protease
recognition motif is predicted at position R745/A746 (S1/S2), which
separates the S1 and S2 subunits (Millet and Whittaker, 2014). In
addition, a second furin cleavage site can be found at R884/S885,
which resembles S2' in the MERS-CoV S protein (Millet and Whittaker,
2014), indicating that the priming of BatCoV HKU5 S in
human cells probably occurs in the same way as for MERS-CoV (Fig. 1A).
Because most betaCoVs use their CTD to bind their respective
receptors, we next focused on the evolutionary relationships of the
CTDs. Consistent with the phylogenetic relationships, HKU5-CTD,
HKU4-RBD/CTD, and MERS-RBD/CTD are grouped in one branch
representing lineage C, while HKU1-CTD and MHV-CTD cluster
together in lineage A. HKU9-CTD, a member of lineage D, is
phylogenetically more closely related to SARS-RBD/CTD, which belongs to
lineage B (Fig. 1B).
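
The furin-like sites above were predicted with the ProP 1.0 server. As a much cruder illustration of the idea, the sketch below scans a sequence for the minimal R-X-X-R furin consensus and reports the position of the arginine immediately preceding the scissile bond (P1). The regular expression, the function name, and the toy sequence are assumptions made for illustration; this is not the ProP algorithm and the toy string is not the HKU5 S sequence.

```python
import re

# Minimal furin consensus R-X-X-R; the lookahead allows overlapping matches.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_minimal_furin_sites(seq: str):
    """Return (P1 position, motif) pairs; positions are 1-based."""
    hits = []
    for match in FURIN_MINIMAL.finditer(seq.upper()):
        p1_position = match.start() + 4  # last R of the four-residue motif
        hits.append((p1_position, match.group(1)))
    return hits


# Toy sequence (hypothetical, for illustration only).
toy_sequence = "MKTARSVRAGGLRRARSQ"
for position, motif in find_minimal_furin_sites(toy_sequence):
    print(f"candidate site {motif}, cleavage after R{position}")
```

A motif scan of this kind over-predicts: many R-X-X-R sequences are buried or otherwise not cleaved, which is why dedicated predictors and experimental validation are used in practice.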

The HKU5-CTD was then purified and crystallized, and its structure
was determined at a resolution of 2.1 Å, with clear electron
density tracing from Q389 to Q586. The structure, solved by
molecular replacement, contains a single molecule in the
crystallographic asymmetric unit, with an Rwork of 0.2160 and an
Rfree of 0.2585 (Table 1). Like the other solved CTD structures of
betaCoVs (Huang et al., 2016; Kirchdoerfer et al., 2016; F. Li et al.,
2005; Lu et al., 2013; Walls et al., 2016; Wang et al., 2013, 2014),
HKU5-CTD folds into two discrete subdomains, the core and the
external (Fig. 2). The core subdomain contains a five-stranded
antiparallel scaffold center (core-center), which is decorated by five helices
(α or 3₁₀) and two small strands (βp1 and βp2) on the exterior. Three
disulfide bonds help to stabilize the scaffold, namely
C391-C415 and C445-C583, located in the peripheral region of the core
subdomain (core-peripheral), and C433-C486 in the core-center,
linking βc2 and βc4. Notably, two antiparallel β strands, one of which
is located at the C-terminus and the other of which forms the disulfide bond
with the N-terminus, help to keep the two termini in proximity. In
addition, traceable electron density can be observed for two
glycosylation modifications at N418 and N495, which form two protrusions at
the core-peripheral region (Fig. 2).
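
N-linked glycans such as those observed at N418 and N495 attach to asparagines within the N-X-S/T sequon, where X is any residue except proline. The sketch below lists candidate sequons in a sequence; the function name, the `offset` parameter, and the toy sequence are hypothetical, and a sequon match only marks a potential site, not an occupied one.

```python
import re

# N-X-[S/T] sequon with X != P; the lookahead allows overlapping matches.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(seq: str, offset: int = 1):
    """Return (asparagine position, sequon) pairs.

    `offset` is the residue number of the first character of `seq`,
    so that a fragment can be reported in full-length numbering.
    """
    return [(m.start() + offset, m.group(1)) for m in SEQUON.finditer(seq.upper())]


# Toy fragment (hypothetical, for illustration only).
toy_fragment = "ACNYTLLNPSGANGSK"
for position, sequon in find_sequons(toy_fragment):
    print(f"potential N-glycosylation site N{position} ({sequon})")
```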

The external subdomain of HKU5-CTD extends out of βc4 in the
core-center, sequentially folds into two antiparallel β strands (β1' and
β2'), an α helix (H1'), and another two antiparallel β strands (β3' and
β4'), and finally proceeds into βc5. Between β1' and H1', a
disulfide bond (C511-C532) is formed to stabilize the external
structure (Fig. 2).

2.2. Conserved core subdomain and variable external subdomain for 
betaCoVs S protein 

To date, seven structures of CTDs covering all four lineages of
betaCoVs have been solved. They are HKU1-CTD and MHV-CTD,
belonging to lineage A (Kirchdoerfer et al., 2016; Walls et al., 2016),
MERS-RBD/CTD (Lu et al., 2013; Wang et al., 2013), HKU4-RBD/CTD
(Wang et al., 2014), and HKU5-CTD (reported here), grouped in lineage C,
and SARS-RBD/CTD (F. Li et al., 2005) and HKU9-CTD
(Huang et al., 2016), representing lineages B and D, respectively.
All seven betaCoV CTD structures display a conserved core subdomain,
with five antiparallel β strands and a conserved disulfide bond
between βc2 and βc4 (Fig. 3).

Despite the different combinations of α helices and β strands, the
orientations of the secondary structures are conserved among CTDs in
the core-peripheral region. In addition, two highly conserved disulfide
bonds exist. One is formed between the N-terminus and the loop/β
strand extending from βc1, and the other links the C-terminus and the
loop/β strand proceeding to βc3. Thus, through the two disulfide
bonds, both termini are brought into close proximity (Fig. 3). Although
the electron density at the C-terminus of SARS-RBD/CTD is not clear
enough to determine the structure (Fig. 3C), two conserved cysteines
are present, indicating that this disulfide bond is likely formed
(Fig. 1C).

In contrast to the conserved core subdomain, the external subdomain
varies considerably among different lineages. In lineage A, the external
subdomain of MHV-CTD, which was obtained by density-guided
homology modelling because of its high flexibility and the poor quality of the
density in this region, consists of four β strands and three small helices
(PDB code: 3CJL) (Fig. 3A). HKU1-CTD is composed of a large,
variable loop with three inlaid β strands (Fig. 3B). The absence of clear




X. Han et al. 


Virology 507 (2017) 101 -109 



[Fig. 1 graphic. (A) Domain schematic of BatCoV HKU5 S with residue boundaries 1-21, 22-361, 375-604, 853-872,
962-977, 989-1037, 1247-1286, 1287-1304, 1305-1320, and 1321-1352. (B) Phylogenetic tree (scale bar 0.1) of
BatCoV HKU9 (lineage D), SARS-CoV (lineage B), HCoV HKU1 (lineage A), and MERS-CoV, BatCoV HKU4, and BatCoV
HKU5 (lineage C). (C) Structure-based sequence alignment of the HKU5, MERS, HKU4, HKU9, SARS, HKU1, and MHV
CTDs.]


Fig. 1. Sequence features of HKU5-CTD. (A) Schematic representation of BatCoV HKU5 S. The indicated domain elements were defined through pairwise sequence alignments or 
bioinformatics analyses. The signal peptides (SP), transmembrane domain (TM), and heptad repeats 1 and 2 (HR1 and HR2, respectively) were predicted with the SignalP 4.0 server, 
TMHMM server, and Learncoil-VMF program, respectively, while the NTD, RBD, and fusion peptides (FP, IFP, and PTM) were deduced by alignment with the N-terminal galectin-like 
domain of murine hepatitis virus S, MERS-RBD/CTD, and SARS-CoV S, respectively. The S1/S2 and S2' sites potentially cleaved by furin-like proteases were predicted using the ProP 
1.0 server. (B) Phylogenetic tree generated using MEGA with the indicated RBD/CTD sequences. (C) Structure-based sequence alignment. The secondary structure elements are defined 
based on an ESPript algorithm and are labeled as in Fig. 2. The arrows and spiral line indicate strands and helices, respectively. The conserved cysteine residues that form four disulfide 
bonds in the structures are marked with Arabic numerals 1-4. The beta strands in the core-center, the elements in the core-peripheral, and the structures in the external domains are 
marked with c, p, and the character with prime, respectively. The blue triangle and green star represent the key amino acids for binding hCD26 in MERS-RBD/CTD and HKU4-RBD/ 
CTD, respectively, while the yellow rhombus indicates two glycosylation sites in HKU5-CTD. 


secondary structure from residues C476-F572 indicates the flexibility
of this region (PDB code: 5I08). In lineage C, three CTDs show similar
external folds, with rmsd values ranging from 0.962 Å (HKU5-CTD vs.
MERS-RBD/CTD (PDB code: 4KQZ)) to 1.178 Å (HKU5-CTD vs. HKU4-RBD/CTD
(PDB code: 4QZV)). All three external subdomains are
strand-dominated structures with four antiparallel β strands and expose a
flat strand-face that is stabilized by a conserved disulfide bond
(Fig. 3D-F). In lineage B, the SARS-RBD/CTD is dominated by a disulfide
bond-stabilized flexible loop that connects two small β strands (PDB
code: 2GHV) (Fig. 3C). In BatCoV HKU9, representing lineage D, the
external subdomain contains only one large helix in this region (PDB
code: 5GYQ) (Fig. 3G).
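
The rmsd values quoted above summarize the distance between equivalent atoms of two superimposed structures. The following is a minimal sketch of that final step only, assuming the two coordinate sets are already superimposed and listed in matching atom order. The function name and toy coordinates are hypothetical; structural alignment programs additionally choose the atom pairs and perform the optimal superposition, which is not shown here.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between matched (x, y, z) coordinates in Å."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must contain the same number of atoms")
    if not coords_a:
        raise ValueError("coordinate sets must not be empty")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Toy coordinates (hypothetical): every atom displaced by 1 Å along z.
set_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
set_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(f"rmsd = {rmsd(set_a, set_b):.3f} Å")
```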

Although their external subdomain structures differ, all CTDs in
betaCoVs extend out from βc4 and proceed back to the core subdomain
through βc5 (Fig. 3), indicating that during evolution, different
insertions in this region resulted in the different structures of the
CTDs. This, in turn, led to different receptor usage where the CTD is
utilized as the RBD.
Table 1
Data collection and refinement statistics.

                                  HKU5 CTD
Data collection
  Wavelength (Å)                  0.97915
  Space group                     P2₁
  Cell dimensions
    a, b, c (Å)                   49.612, 212.659, 87.943
    α, β, γ (deg)                 90.000, 94.756, 90.000
  Resolution (Å)                  50.00-2.10 (2.18-2.10) a
  Rmerge                          0.104 (1.038)
  Rpim                            0.052 (0.527)
  I/σI                            15.355 (1.645)
  CC1/2                           0.998 (0.801)
  Completeness (%)                99.9 (99.9)
  Redundancy                      5.0 (4.8)
Refinement
  Resolution (Å)                  37.80-2.10
  No. reflections                 104555
  Rwork / Rfree                   0.2160/0.2585
  No. atoms
    Protein                       11032
    Ligand/ion                    0
    Water                         527
  B-factors
    Protein                       52.9
    Ligand/ion                    -
    Water                         45.2
  R.m.s. deviations
    Bond lengths (Å)              0.006
    Bond angles (deg)             1.062
  Ramachandran plot
    Favored (%)                   95.73
    Allowed (%)                   3.44
    Outliers (%)                  0.83

a Values in parentheses are for highest-resolution shell.
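
Two quick arithmetic checks can be run on the refinement statistics in Table 1: the gap between Rfree and Rwork, and whether the Ramachandran categories account for all residues. The sketch below hard-codes the values from Table 1; the dictionary keys and function names are hypothetical, and the checks are a reader's sanity test rather than part of the refinement protocol.

```python
# Values copied from Table 1 (HKU5 CTD).
table1 = {
    "r_work": 0.2160,
    "r_free": 0.2585,
    "rama_favored_pct": 95.73,
    "rama_allowed_pct": 3.44,
    "rama_outliers_pct": 0.83,
}


def r_gap(stats):
    """Difference between Rfree and Rwork."""
    return stats["r_free"] - stats["r_work"]


def ramachandran_total(stats):
    """Sum of the three Ramachandran categories, in percent."""
    return (
        stats["rama_favored_pct"]
        + stats["rama_allowed_pct"]
        + stats["rama_outliers_pct"]
    )


print(f"Rfree - Rwork = {r_gap(table1):.4f}")
print(f"Ramachandran total = {ramachandran_total(table1):.2f}%")
```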


2.3. Structural basis for HKU5-CTD not binding to CD26 

Both MERS-RBD/CTD and HKU4-RBD/CTD bind to hCD26 to
initiate infection. In addition, the structure of the HKU5-CTD displays
a similar topology to the two RBD/CTDs in lineage C. Thus, we assayed
for binding between HKU5-CTD and hCD26. However, consistent with
previous results, no binding was detected, either by
fluorescence-activated cell sorting (FACS) or surface plasmon resonance (SPR)
(Fig. 4).

According to the two solved complex structures, four clusters
of residues in MERS-RBD/CTD and HKU4-RBD/CTD are
involved in binding to hCD26. These residues are located in four β
strands and two loops (β1'/β2' loop and β3'/β4' loop) in the external
subdomain, as well as in H4 and H5 (for MERS-RBD/CTD) or H5 and H6
(for HKU4-RBD/CTD) positioned in the core subdomain and the loop
connecting the two helices (Figs. 1C, 3E and 3F). However, half of these
regions (β1'/β2' loop and β3') have deletions in HKU5-CTD (Fig. 1C).
Due to these deletions, the orientations of two loops (marked 1 and 2,
respectively, in Fig. 3D-F) in HKU5-CTD differ from those in
MERS-RBD/CTD and HKU4-RBD/CTD, which leads to conformational shifts
in HKU5-CTD at the hCD26-binding interface (Fig. 5A and E). The
β1'/β2' loop in both MERS-RBD/CTD and HKU4-RBD/CTD inserts
into the groove formed by two helices on the side and β strands on the
bottom (Fig. 5B and F). Through this loop, 65 (of 328 in total) and 49 (of
214 in total) van der Waals contacts, including 5 (of 16 in total) and 4 (of
13 in total) hydrogen bonds, are formed in the MERS-RBD/CTD/hCD26 and
HKU4-RBD/CTD/hCD26 complexes, respectively. In contrast, this loop in
HKU5-CTD is tilted away by ~6 and 9 Å (Fig. 5B and F) compared to
MERS-RBD/CTD and HKU4-RBD/CTD, respectively, which results in the loss of
binding to hCD26 at this region.
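
To put the contact counts above in proportion, the sketch below computes the share of all interface contacts attributed to the loop that inserts into the hCD26 groove. It assumes that the first number in each pair is the loop's contribution and the number in parentheses is the total for the whole complex, which is our reading of the sentence above; the dictionary layout and function name are hypothetical.

```python
# (loop contacts, total contacts) as quoted in the text above.
loop_contacts = {
    "MERS-RBD/CTD": {"van der Waals": (65, 328), "hydrogen bonds": (5, 16)},
    "HKU4-RBD/CTD": {"van der Waals": (49, 214), "hydrogen bonds": (4, 13)},
}


def share(part: int, total: int) -> float:
    """Percentage of `total` contributed by `part`."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * part / total


for complex_name, contacts in loop_contacts.items():
    for contact_type, (part, total) in contacts.items():
        print(f"{complex_name}, {contact_type}: "
              f"{part}/{total} = {share(part, total):.1f}%")
```

Under this reading, the loop accounts for roughly a fifth of the van der Waals contacts and close to a third of the hydrogen bonds in each complex, which is consistent with its displacement in HKU5-CTD having a large effect on binding.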

Moreover, a six-residue deletion in β3' causes large differences in
the arrangement of β3', β4' and their connecting loop, compared with
MERS-RBD/CTD and HKU4-RBD/CTD (Fig. 3D-F). In β3' of both
hCD26-binding RBD/CTDs, the side chains of Y540 and R542 face the
receptor, conferring a strong hydrophilic interaction between the ligand
and the receptor. In contrast, in HKU5-CTD, the orientation of β3' is
opposite. In addition, Y544 in HKU5-CTD likely clashes sterically with Q286,
which further pushes HKU5-CTD away from hCD26 (Fig. 5C and G). In the
other β strand, β4', both MERS-RBD/CTD and HKU4-RBD/CTD
form a large hydrophobic interaction patch with hCD26. In HKU5-CTD,
aside from the shift of the β3'/β4' loop away from the receptor, the
deletion of hydrophobic residues (e.g., W535) compared to MERS-RBD/CTD
(Fig. 5D) and the substitution of hydrophilic residues (e.g., T553 and
T555) for hydrophobic ones (I560 and V562 in HKU4-RBD/CTD)
(Fig. 5H) likely inhibit HKU5-CTD binding to hCD26. In total, the
conformational variations between HKU5-CTD and the hCD26-binding
RBD/CTDs explain the lack of hCD26 binding by HKU5-CTD.
However, various deletions in HKU5-CTD loop 1 are present in nature
(Fig. 6) and might contribute to the evolution of receptor binding.






Fig. 2. Crystal structure of the HKU5-CTD. The core and external subdomains are colored orange and magenta, respectively. The core subdomain is further divided into a center
region (core-center) and a peripheral region (core-peripheral). The core-center strands and helices are labeled βc1-βc5 and H1-H5, respectively, while the core-peripheral strands are
marked βp1 and βp2. The glycan moieties are displayed as sticks and marked in the left panel. The disulfide bonds are presented as spheres and labeled in the right panel. Both the N- and
C-termini are indicated with arrows. To depict the structure clearly, cartoon structures inside the transparent surface are presented at two angles.
Fig. 3. Topological diagrams of CTDs in betaCoVs. Structural and topological comparison of available betaCoV CTD structures. Seven structures, including those of MHV-CTD
(PDB code: 3CJL), HKU1-CTD (PDB code: 5I08), SARS-RBD/CTD (PDB code: 2GHV), HKU5-CTD, HKU4-RBD/CTD (PDB code: 4QZV), MERS-RBD/CTD (PDB code: 4KQZ), and
HKU9-CTD (PDB code: 5GYQ) were oriented similarly and are presented as cartoons in parallel. The conserved disulfide bonds are shown as red lines, while the non-conserved ones are
displayed as lines in the color of the indicated external subdomain. Arrows and cylinders represent the strands and helices, respectively.


3. Discussion 

In this study, we solved the crystal structure of HKU5-CTD, which
represents the seventh structure of a CTD belonging to a betaCoV. Like
the other six CTDs (Huang et al., 2016; Kirchdoerfer et al., 2016; F. Li
et al., 2005; Lu et al., 2013; Peng et al., 2011; Walls et al., 2016; Wang
et al., 2013, 2014), HKU5-CTD has two subdomains, the core
and the external. Despite the low residue conservation among the CTDs
(pair-to-pair amino acid identity ranging from 17.2% to 58.7%) and the
core subdomains (pair-to-pair amino acid identity ranging from 16.6%
to 66.7%) in the four lineages, the topology of the core subdomains is highly
conserved, with five antiparallel β strands constituting the core-center
and the same orientation of secondary elements in the core-peripheral
region (Fig. 3). This includes the same region of MHV, which uses the NTD of
S1 to bind the receptor.

However, the external subdomains vary considerably among
lineages. In lineage A, the MHV-CTD contains several β strands and
inlaid helices, while the HKU1-CTD is composed of loops and three
small β strands. However, approximately 100 amino acids (C476-F572)
are unclear in this region, likely due to their flexibility. SARS-RBD/CTD,
in lineage B, is dominated by loops, which are stabilized by a
disulfide bond and two antiparallel β strands. Most CTD structures
solved to date are in lineage C, and all three CTDs (MERS-RBD/CTD,
HKU4-RBD/CTD, and HKU5-CTD) display conserved structures with
β strand-forming platforms decorated with helices. In addition, a
disulfide bond is conserved among CTDs in lineage C in the external
subdomain. HKU9-CTD, a member of lineage D, is composed of a helix
that is clamped with loops. Although different structures and
topologies exist among lineages, all external subdomains extend out from βc4
and proceed back to βc5 (Fig. 3), indicating that different insertions
between βc4 and βc5 during betaCoV evolution have conferred
different properties on the betaCoVs, such as receptor usage, and thereby
led to the parallel evolution of lineages.

The ligand-receptor interaction is a key factor determining the 
tissue tropism and host range of CoVs. For SARS-CoV, MERS-CoV, and 
BatCoV HKU4, the receptors are clear, and the complex structures 
demonstrate that the receptor mainly binds to the varied external 
subdomains of CTDs. Neutralizing antibodies against HCoV HKU1 
Fig. 4. Characterization of HKU5-CTD by FACS and SPR. (A) Huh7 cells were stained with MERS-RBD/CTD (green), HKU4-RBD/CTD (orange), and HKU5-CTD (red),
respectively. (B-D) BHK21 cells transfected with hCD26 (BHK-hCD26) were stained with MERS-RBD/CTD (B), HKU4-RBD/CTD (C), and HKU5-CTD (D), respectively. (E-H) The
indicated protein was immobilized on a CM5 chip, and a concentration gradient of hCD26 was flowed over the chip. The response units (RUs) were recorded. (E) hCD26 binding to HKU5-CTD. (F)
hCD26 binding to MERS-RBD/CTD. (G) hCD26 binding to HKU4-RBD/CTD. (H) The saturation profile for HKU4-RBD/CTD binding to hCD26.
Fig. 5. Structural basis for the lack of binding between HKU5-CTD and hCD26. Superimposition of the structures of HKU5-CTD and hCD26-bound MERS-RBD/CTD (A-D)
or HKU4-RBD/CTD (E-H). The variations in the receptor-binding interface of HKU5-CTD compared with MERS-RBD/CTD or HKU4-RBD/CTD are labeled B-D and F-H and
further delineated in panels B-D and F-H for detailed structural shifts, respectively. The conserved core subdomains in HKU5-CTD, MERS-RBD/CTD, and HKU4-RBD/CTD are colored
grey, while the external subdomains of the three proteins are colored orange, cyan, and wheat, respectively. hCD26 is shown in magenta.


bind to the HKU1-CTD, not the HKU1-NTD (Qian et al., 2015),
indicating that the CTD in HCoV HKU1 is most likely to be the RBD,
though the protein receptor has yet to be identified (Chan et al., 2016;
Huang et al., 2015). HKU9-CTD does not bind to ACE2 or hCD26
(Huang et al., 2016). In our study, we found that although HKU5-CTD
displays a similar structure and topology to MERS-RBD/CTD and









































[Sequence alignment graphic: MERS, HKU4, HKU5, and GenBank entries KC522090.1-KC522104.1 and EF065509.1-EF065512.1.]

K 

T 

s 

A 

Y 

G 



K 

N 

Y 

L 

Y 

N 

A 

P 

G 

G 

T 

P 

c 

L 






527 

532 

533 

533 

534 
534 
533 

533 

534 

533 

534 
534 
534 
534 
534 
534 
534 
534 
533 

533 

534 
533 


MERS 

HKU4 

KC522097.1 
KC522091.1 
KC522104.1 
KC522094.1 
KC522100.1 
KC522098.1 
KC522092.1 
KC522099.1 
EF065512.1 
EF065511.1 
EF065510.1 
KC522102.1 
KC522101.1 
KC522095.1 
KC522103.1 
KC522096.1 
HKU5 

EF065509.1 
KC522093.1 
KC522090.1 


S I VP 
DFSP 
Q L AN 
Q L AN 
SLAS 
SLAS 
SLAS 
SLAS 
SLAS 
SLAS 
S LAS 
SLAS 
SLAS 
SLAS 
SLAS 
SLAS 
SLAS 
SLAS 
SLAS 
SLAS 
SLAS 
S LAS 


TVW 

GFS 

GFS 

GFS 

GFS 

GFS 

GFS 

GFS 

GFS 

GFS 

GFT 

GFT 

GFT 

GFS 

GFS 

GFS 

GFS 

GFS 

GFS 

GFS 

GFS 

GFS 


EDG 
EDG 
RSG 
RSG 
RDR 
RDR 
RDR 
RDR 
RDR 
RDR 
TNR 
TNR 
TNR 
NK Y 
NKH 
TK Y 
TKY 
TK Y 
TKY 
TKY 
TKY 
TKY 






1 

D Y YR 

KQLSPLE 

GG 

GW 

LVAS 

G 

QVFK 

RTLTQFE 

GG 

GL 

LIGV 

G 

QSNS 

VE...LP 

SR 

EF 

K T AT 

G 

QSNS 

VE...LP 

SG 

EF 

K T AT 

G 

QSHS 

QE...LP 

DG 

SF 

LTTT 

G 

QSHS 

QE...LP 

DG 

SF 

LTTT 

G 

QSHS 

QE...LP 

DG 

SF 

LTTT 

G 

QSHS 

QE...LP 

DG 

SF 

LTTT 

G 

QSHS 

QE...LP 

DG 

SF 

LTTT 

G 

QSHS 

QE...LP 

DG 

SF 

LTTT 

G 

QSHS 

LE...LP 

DG 

Y . 

LVTT 

G 

QSHS 

LE...LP 

DG 

Y . 

LVTT 

G 

QSHS 

LE...LP 

DG 

Y . 

LVTT 

G 

QSHS 


DG 

E . 

LTTT 

G 

QSHS 


DG 

N . 

LTTT 

G 

QSHS 


DG 

E . 

LTTT 

G 

QSHS 


DG 

E . 

LTTT 

G 

QSHS 


DG 

E . 

LTTT 

G 

QSHS 


DG 

E . 

LTTT 

G 

QSHS 


DG 

E . 

LTTT 

G 

QSHS 


DG 

E . 

LTTT 

G 

QSHS 


DG 

E . 

LTTT 

G 







TVA 
R V|P 
I I 
I I 
VY 
VY 
VY 
VY 
VY 
VY 
VY 
VY 
VY 
I Y 
I Y 
VY 
I Y 
I Y 
I Y 
I Y 
I Y 
I Y 


M 

T 

M 

T 

M 

T 

M 

T 

V 

G 

V 

G 

V 

G 

V 

G 

VG 

V 

G 

V 

N 

VN 

VN 

VN 

VN 

V 

T 

V 

T 

V 

N 

V 

T 

V 

T 

V 

T 

V 

T 


QL 
NL 
QL 
QL 
NF 
NL 
N L 
NL 
NL 
NL 
NL 
NL 
NL 
NL 
NL 
NL 
NL 
N L 
NL 
N L 
NL 
N L 



606 

611 

609 

609 

610 
610 
609 

609 

610 
609 
609 
609 
609 
605 
605 
605 
605 
605 
604 

604 

605 
604 


Fig. 6. CTDs of HKU5 show diversity. All referenced sequences of the HKU5 CTD region were analyzed by alignment. The two black boxes indicate the sequences of loops 1 and 2
marked in Fig. 3C.


HKU4-RBD/CTD, detailed structural analysis revealed variations at the
hCD26-binding interface that result in the loss of binding to this
receptor. Thus, subtle differences in the external subdomains can
determine receptor usage.

In addition to receptor binding, the priming process, in which host
proteases liberate S2 and the fusion peptides from the otherwise
covalently linked S1 subunit, is another key factor affecting cell
tropism and the entry route of CoVs. A two-step activation mechanism
has been proposed for MERS-CoV entry (Millet and Whittaker, 2014).
During secretion of the S protein, proteolysis at S1/S2 occurs in the
endoplasmic reticulum (ER)-Golgi compartments where furin is localized,
while S2' is cleaved during virus entry into target cells. The same
proteolysis at S1/S2 and S2' is also essential for SARS-CoV infection,
except that, owing to the lack of a furin-recognition site at S1/S2,
SARS-CoV S remains uncleaved after biosynthesis and a diverse array of
proteases is involved in this process (Millet and Whittaker, 2015;
Simmons et al., 2011). In contrast, although BatCoV HKU4 can utilize
hCD26 as a receptor, proteolytic priming is blocked by the lack of a
protease site. Treating pseudovirus particles carrying the BatCoV
HKU4 S protein with trypsin, or introducing a furin-recognition site
into the S protein, enables the particles to infect hCD26-expressing
cells (Wang et al., 2014; Yang et al., 2015), indicating that BatCoV
HKU4 is less adapted to human cells. However, in the BatCoV HKU5 S
protein, both furin-recognition sites are present, as in MERS-CoV.
Accordingly, BatCoV HKU5 S is predicted to be cleaved at S1/S2 after
biosynthesis and at S2' during virus entry. Furin is a ubiquitous
proteinase that is expressed in nearly all cell lines. The presence of
the two furin-recognition sites suggests that the priming process of
BatCoV HKU5 can readily occur.
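
The kind of motif screen implied by this argument can be illustrated with a short script. This is only a minimal sketch, assuming the commonly cited minimal furin recognition motif R-X-X-R (cleavage after the final arginine); the function name and the input string are hypothetical toy choices, not the HKU5 or MERS-CoV S sequence. A motif match indicates a candidate site only and does not demonstrate cleavage.

```python
import re

# Assumption: minimal furin recognition motif R-X-X-R, where X is any residue.
# A lookahead is used so that overlapping motifs are all reported.
FURIN_MINIMAL = re.compile(r"(?=(R..R))")


def find_furin_like_sites(seq: str):
    """Return (1-based start position, motif) for every R-X-X-R match."""
    seq = seq.upper()
    return [(m.start() + 1, m.group(1)) for m in FURIN_MINIMAL.finditer(seq)]


if __name__ == "__main__":
    toy_sequence = "GTPRSVRSVA"  # hypothetical toy peptide, for illustration only
    for position, motif in find_furin_like_sites(toy_sequence):
        print(f"candidate furin-like motif {motif} at position {position}")
```

A stricter pattern such as R-X-[K/R]-R could be substituted in the regular expression if fewer candidate sites are wanted.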

BatCoV HKU5 has been circulating in bats (Lau et al., 2013). In an
epidemiological study over a 7-year period (April 2005 to August 2012),
25% of alimentary specimens of Japanese pipistrelle bats (Pipistrellus
abramus) collected from 13 locations in Hong Kong were positive for
this virus (Lau et al., 2013), indicating that it might target
gastrointestinal tissues. However, BatCoV HKU5 has not been
successfully isolated and cultured, which is an obstacle to research on
virus transmission. The problem is largely due to a lack of suitable
cell lines for BatCoV HKU5. The open question is whether this virus
could infect humans.

An infectious clone of BatCoV HKU5 containing the ectodomain
from the SARS-CoV S protein was constructed through reverse genetics
and synthetic-genome design, and the recombinant virus replicates
efficiently in cell culture and in young and aged mice (Agnihothram
et al., 2014). In addition, the key proteins for virus replication, such
as the 3C-like protease, polymerase, and exonuclease of BatCoV HKU5,
display high amino acid sequence similarity to those of MERS-CoV,
indicating that once the genome of BatCoV HKU5 is released into host
cells, genome replication, virus particle assembly, and release can
readily occur. Therefore, the receptor would be the last barrier for
BatCoV HKU5 to infect humans. Our data show that BatCoV HKU5-CTD
does not use hCD26 as a receptor, though it folds into a structure very
similar to those of MERS-RBD/CTD and HKU4-RBD/CTD. In other
words, the cellular receptor of BatCoV HKU5 remains unknown and
requires further study.
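
As a rough illustration of how sequence relatedness between two aligned proteins can be quantified, the sketch below computes percent identity over gap-free alignment columns. It is a simplification: identity is a stricter measure than the similarity discussed above, which would normally be scored with a substitution matrix. The function name and the toy strings are hypothetical and are not taken from the HKU5 or MERS-CoV proteins.

```python
def percent_identity(a: str, b: str, gap_chars: str = ".-") -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Columns in which either sequence carries a gap character are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # ignore any column that contains a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment, for illustration only.
    print(round(percent_identity("ACDEFG", "ACDQFG"), 1))
```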

Evolutionarily, the BatCoV HKU5 S protein is more diverse than the
BatCoV HKU4 S protein (Lau et al., 2013), and various deletions in loop 1
(Fig. 3D and Fig. 6) have been found in sequenced strains. This indicates
that BatCoV HKU5 is able to generate variants to occupy new ecological
niches and might acquire the ability to bind to hCD26 by accumulating
mutations and ultimately cause human respiratory infections like
MERS-CoV and SARS-CoV. Accordingly, it is very important to perform
long-term surveillance of BatCoV HKU5 evolution, especially the
variation of the S protein, in case the virus breaks the inter-species
and/or inter-tissue transmission barriers.
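
Surveillance of loop deletions of this kind amounts to locating gap runs in aligned sequences. The following is a minimal sketch under stated assumptions: gaps are written as "." or "-" in the alignment, the function name is hypothetical, and the input is a toy string rather than a real HKU5 sequence.

```python
import re


def gap_runs(aligned: str, gap_chars: str = ".-"):
    """Return (1-based start column, length) for each run of gap characters."""
    pattern = re.compile("[" + re.escape(gap_chars) + "]+")
    return [(m.start() + 1, m.end() - m.start()) for m in pattern.finditer(aligned)]


if __name__ == "__main__":
    toy_row = "ACDE...FGH"  # hypothetical aligned row with a 3-residue deletion
    for start, length in gap_runs(toy_row):
        print(f"deletion of {length} residue(s) starting at alignment column {start}")
```

Comparing the reported columns against the loop 1 and loop 2 boundaries of a reference row would flag variants whose deletions differ from those already described.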


4. Materials and methods 

4.1. Gene construction, protein expression, and purification 

The coding region for HKU5-CTD (Q389-Q586) with a 6×His tag at
its C-terminus was cloned into the EcoRI and XhoI sites of pFastBac.
HKU5-CTD, MERS-RBD/CTD, HKU4-RBD/CTD, and hCD26 were
expressed and purified as previously reported (Lu et al., 2013; Wang
et al., 2014). Briefly, the proteins were expressed in baculovirus-
infected Hi5 cells. After 48 h of culture, the supernatants were
collected and purified on a 5 mL HisTrap HP column (GE
Healthcare). The bound protein was eluted with an imidazole gradient.
Fractions containing the target protein, as determined by SDS-PAGE,
were pooled and further subjected to gel filtration on a Superdex® 75
column (GE Healthcare) in a buffer composed of 20 mM Tris-HCl (pH
8.0) and 50 mM NaCl.

The Fc fusion proteins used for cell staining were purified following a
previously published method (Lu et al., 2013; Wang et al., 2014). In
brief, the plasmid containing the target gene was transfected into
HEK293T cells. After 3 and 7 d of culture, the supernatants
containing secreted protein were pooled and subjected to HiTrap
Protein A chromatography (5 mL, GE Healthcare). Protein was eluted
with 0.1 M sodium citrate (pH 4.5) and further purified by gel
filtration. The protein was finally buffer-exchanged into PBS,
concentrated to ~1 mg/mL, and stored at -80 °C until use.
4.2. Crystallization, data collection, and structure determination

The HKU5-CTD protein expressed in Hi5 cells was crystallized by
sitting-drop vapor diffusion at 18 °C. Diffraction-quality crystals were
obtained in a condition consisting of 0.2 M potassium thiocyanate and
20% (w/v) polyethylene glycol 3350 at a protein concentration of
15 mg/mL. Crystals were cryoprotected in reservoir solution containing
20% (v/v) glycerol and flash-frozen in liquid nitrogen. Diffraction data
were collected at the Shanghai Synchrotron Radiation Facility (SSRF)
beamline BL17U and processed with HKL2000 (Otwinowski and Minor, 1997).

The HKU5-CTD structure was solved by molecular replacement
using Phaser (McCoy et al., 2007) from the CCP4 program suite
(Winn et al., 2011) with the structure of HKU4-RBD/CTD (PDB: 5GJ4) as
the search model. Initial restrained rigid-body refinement and manual
model building were performed using REFMAC5 (Murshudov et al.,
2011) and COOT (Debreczeni and Emsley, 2012), respectively. Further
refinement was performed using Phenix (Adams et al., 2010). Final
statistics for data collection and structure refinement are presented in
Table 1. Atomic coordinates and structure factors have been deposited in
the Protein Data Bank with accession code 5XGR.

4.3. SPR analysis 

The BIAcore experiments were performed at 25 °C using a BIAcore
3000 machine with CM5 chips (GE Healthcare). All proteins used in the
experiment were expressed in insect cells. After purification, the proteins
were exchanged into PBS (pH 7.4) containing 0.005% (v/v) Tween 20. The
MERS-RBD/CTD, HKU4-RBD/CTD, and HKU5-CTD proteins were each
immobilized on the chip at ~1000 response units (RUs). A gradient of
hCD26 concentrations (ranging from 0.195 to 200 μM) was then
injected at 30 μL/min. After each cycle, the sensor surface was regenerated
using 7 μL of 10 mM NaOH. Measurements from the reference flow cell
(immobilized with BSA) were subtracted from experimental values. The
binding kinetics were analyzed with BIAevaluation Version 4.1 software
using the 1:1 Langmuir binding and/or the steady-state affinity models.
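
The analyte series and the steady-state affinity model can be sketched as follows. This is an illustration under stated assumptions, not the BIAevaluation algorithm: it assumes a twofold dilution series (11 points from a top concentration of 200 reproduce the stated lower limit of about 0.195), the standard 1:1 steady-state relation Req = Rmax·C/(KD + C), and a simple Hanes-Woolf linearization for fitting. The function names and the KD and Rmax values are hypothetical, chosen only to generate synthetic data.

```python
def twofold_series(top: float, n_points: int):
    """Concentrations from `top` downward in twofold dilution steps."""
    return [top / 2 ** i for i in range(n_points)]


def steady_state_response(conc: float, kd: float, rmax: float) -> float:
    """Equilibrium response for 1:1 binding: Req = Rmax * C / (KD + C)."""
    return rmax * conc / (kd + conc)


def fit_steady_state(concs, responses):
    """Estimate (KD, Rmax) by least squares on the Hanes-Woolf form.

    C / Req = C / Rmax + KD / Rmax, so slope = 1/Rmax and
    intercept = KD/Rmax. Responses must be non-zero.
    """
    xs = list(concs)
    ys = [c / r for c, r in zip(concs, responses)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rmax = 1.0 / slope
    kd = intercept * rmax
    return kd, rmax


if __name__ == "__main__":
    concs = twofold_series(200.0, 11)  # same unit as the injected analyte
    # Synthetic, noise-free responses from hypothetical KD and Rmax values.
    responses = [steady_state_response(c, kd=35.0, rmax=500.0) for c in concs]
    kd_est, rmax_est = fit_steady_state(concs, responses)
    print(f"lowest concentration: {concs[-1]:.4f}")
    print(f"estimated KD: {kd_est:.3f}, estimated Rmax: {rmax_est:.3f}")
```

With real sensorgram data, a direct nonlinear fit is generally preferable to the linearization, which weights noisy low-response points unevenly.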

4.4. Flow cytometry assay 

Human hepatocellular carcinoma Huh7 cells or BHK-21 cells transfected
with hCD26 (BHK-hCD26) were used for the binding test. Cells
were stained with mouse Fc-fusion proteins, including MERS-RBD/CTD,
HKU4-RBD/CTD, and HKU5-CTD, expressed in HEK293T
cells. BHK-21 cells were transfected with hCD26-containing plasmids
24 h before staining. Huh7 and BHK-hCD26 cells were suspended in PBS
and incubated with individual proteins at 4 °C for 30 min, then washed
and further incubated at 4 °C for another 30 min with an anti-mIgG/PE
antibody. After washing, the cells were analyzed using a BD
FACSCalibur. The data were processed with FlowJo software.